[1] We investigate seismicity near faults in the Southern California Earthquake Center
Community  Fault  Model.  We  search  for  anomalously  large  events  that  might  be  signs  of  a 
characteristic  earthquake  distribution.  We  find  that  seismicity  near  major  fault  zones  in 
Southern  California  is  well  modeled  by  a  Gutenberg-Richter  distribution,  with  no  evidence 
of characteristic earthquakes within the resolution limits of the modern instrumental
catalog.  However,  the  b  value  of  the  locally  observed  magnitude  distribution  is  found  to 
depend  on  distance  to  the  nearest  mapped  fault  segment,  which  suggests  that  earthquakes 
nucleating  near  major  faults  are  likely  to  have  larger  magnitudes  relative  to  earthquakes 
nucleating  far  from  major  faults. 
1. Introduction

[2] It is well known that earthquake magnitudes within
large regions follow the Gutenberg-Richter (G-R) distribution.
The Gutenberg-Richter magnitude distribution relates
the cumulative number of earthquakes N above a given
magnitude, M, by

$$\log N = a - bM, \qquad (1)$$

where a and b are constants [Ishimoto and Iida, 1939;
Gutenberg and Richter, 1944]. The b value is generally
approximately 1 [Frohlich and Davis, 1993], which means,
in combination with constant stress drop scaling [e.g., Aki,
1972], that the number of earthquakes in a given magnitude
range is proportional to the reciprocal of the fault rupture
area. For California, b = 1 matches the modern catalog
well [Felzer, 2008; Hutton et al., 2010].

[3]  While  the  Gutenberg-Richter  distribution  is  used  to 
model  seismicity  in  large  regions,  there  is  some  question  as 
to  whether  it  applies  to  earthquakes  in  individual  fault  zones. 
The  characteristic  magnitude  distribution  [Wesnousky  et  al., 
1983;  Schwartz  and  Coppersmith,  1984;  Wesnousky,  1994] 
alternatively  holds  that  large  earthquakes  in  major  fault 
zones  occur  at  a  higher  rate  relative  to  smaller  earthquakes 
than  the  Gutenberg-Richter  distribution  would  predict.  The 
characteristic  magnitude  distribution  has  been  suggested  in 
part  because  of  an  apparent  mismatch  between  paleoinferred 
rates  of  large  earthquakes  on  major  faults  and  rates  of  smaller 
earthquakes  from  the  instrumental  catalog  for  a  narrow 
region surrounding the fault. In this work we consider only
the modern instrumental catalog, for which hypocenters are
known  and  magnitudes  are  well  characterized.  This  choice 
will  limit  the  size  of  the  catalog  and  therefore  the  highest 
magnitudes  available;  however,  including  data  from  many 
faults  throughout  California,  rather  than  studying  a  single 
fault  zone,  improves  the  power  of  our  tests  considerably. 

[4]  The  characteristic  magnitude  distribution  is  often 
used  in  seismic  hazard  analysis  [e.g.,  Working  Group  on 
California  Earthquake  Probabilities,  1990a,  1990b,  1995, 
1999; Field et al., 2008]. However, the use of the characteristic
earthquake model can lead to some difficulty in matching
regional catalog rates. On a state-wide basis, magnitudes
are  G-R  distributed,  and  it  can  be  difficult  to  produce  an 
overall  catalog  that  matches  the  G-R  distribution  when 
seismicity on individual faults is modeled with a characteristic
distribution. Previous statewide hazard models for
California  have  contained  discrepancies  between  historic 
earthquake  rates  and  rates  given  by  the  model  between 
magnitudes 6 and 7 [Field et al., 1999; Petersen et al., 2000].
Significant  tinkering  with  model  parameters  has  been 
required  to  alleviate  what  has  colloquially  become  known  as 
the  “battle  of  the  bulge”  [Field  et  al.,  2008]. 

[5]  Inherent  in  the  characteristic  earthquake  hypothesis  is 
a  scaling  break  between  the  large  and  small  events  on  a 
given  fault.  Southern  California  is  a  good  place  to  look  for 
such  a  scaling  break,  if  it  exists,  since  there  are  earthquake 
catalogs  of  well-located  earthquakes  with  well-characterized 
magnitudes  and  digital  models  of  3-D  fault  surfaces  in  the 
region. We investigate the magnitude distribution of earthquakes
near major fault zones in Southern California to
determine  if  the  largest  events  in  fault  zones  are  larger  than 
would  be  predicted  by  a  Gutenberg-Richter  distribution.  We 
begin  by  examining  seismicity  near  the  Parkfield  section  of 
the  San  Andreas  Fault  before  extending  our  analysis  to  all 
major  mapped  fault  zones  in  Southern  California.  We  also 
look  for  changes  in  the  magnitude  distribution  with  distance 
[Figure 1, panel a title: Magnitude Distribution for Parkfield Section]


Figure 1. The cumulative magnitude distribution for earthquakes
within 5 km of the Parkfield section of the San
Andreas Fault is shown in blue. (a) An analysis of the b
value error alone could lead to the erroneous conclusion that
the largest events in this zone violate Gutenberg-Richter
(G-R) behavior. (b) However, random samples drawn from
a G-R distribution (black lines) demonstrate considerable
scatter. The largest event is within the scatter predicted from
random G-R samples and thus does not violate the null
hypothesis that Parkfield earthquake magnitudes are drawn
from a G-R distribution with a b value of 1.

from major fault zones, to see if the catalog contains differences
between major fault seismicity and regional seismicity
that are often assumed in seismic hazard models.

2.  Seismicity  Near  Parkfield 

[6] The Parkfield section of the San Andreas Fault has
been hypothesized to rupture in quasiperiodic “characteristic”
events of approximately magnitude 6 [Bakun and Lindh,
1985; Jackson and Kagan, 2006]. We do not consider time
dependence in this study, but focus instead on the magnitude
distribution for this fault section. Is there an increase in M6
earthquakes near Parkfield, beyond what would be consistent
with G-R statistics?

[7] Figure 1a shows the cumulative magnitude distribution
for Parkfield section earthquakes. Events are included
from the ANSS catalog, 1984-2007, within 5 km of the fault
trace, as defined by the Working Group on California
Earthquake Probabilities [Field et al., 2008]. Comparing
directly to a best fit G-R curve, the G-R distribution appears
to severely underpredict the rate of M6 earthquakes. Even
allowing for b value error (95% confidence bounds for the
b value are determined by maximum likelihood [Aki, 1965])
does not explain this apparent underprediction. It is easy
to see from Figure 1a one reason for the development of the
characteristic earthquake hypothesis.

[8]  However,  this  simple  analysis  fails  to  account  for  the 
inherent  variability  in  the  tail  of  the  distribution  where 
sampling  error  becomes  important.  We  can  see  this  by 
generating  20  random  sets  of  magnitudes  drawn  from  the 
Gutenberg-Richter  distribution,  each  with  the  same  number 
of events as the Parkfield data set (Figure 1b). These samples
show  that  variability  in  the  tail  of  the  power  law  distribution 
is  the  rule  rather  than  the  exception.  In  fact,  considering  this 
variability,  the  rate  of  M6  events  in  Parkfield  is  consistent 
with  a  b  =  1  G-R  distribution  at  95%  confidence  (the  exact 
95%  confidence  bounds  for  each  point  in  the  curve  are 
shown with the shading in Figure 1b).
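The sampling experiment described in this section can be sketched in a few lines. The following is a minimal illustration, not the procedure actually used for Figure 1b: it assumes an unbounded G-R distribution with continuous (unbinned) magnitudes, and the function names, the catalog size of 1000 events, and the random seed are all hypothetical choices made for the example. The b value estimator is the maximum likelihood form of Aki [1965], with an approximate 95% half-width of 1.96 b / sqrt(N).

```python
import math
import random


def sample_gr_magnitudes(n, m_min=2.5, b=1.0, rng=random):
    """Draw n magnitudes from an unbounded G-R distribution above m_min."""
    # P(M >= m) = 10**(-b (m - m_min)), so m - m_min is exponential
    # with rate b * ln(10).
    rate = b * math.log(10.0)
    return [m_min + rng.expovariate(rate) for _ in range(n)]


def b_value_mle(mags, m_min):
    """Aki (1965) maximum likelihood b value and approximate 95% half-width.

    Assumes continuous magnitudes that are complete above m_min.
    """
    kept = [m for m in mags if m >= m_min]
    mean_excess = sum(kept) / len(kept) - m_min
    b = math.log10(math.e) / mean_excess
    return b, 1.96 * b / math.sqrt(len(kept))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_events = 1000          # hypothetical catalog size, not the Parkfield count
largest = []
for _ in range(20):      # 20 synthetic catalogs, as in the text
    mags = sample_gr_magnitudes(n_events, rng=rng)
    largest.append(max(mags))
print("largest magnitude per sample: min %.2f, max %.2f"
      % (min(largest), max(largest)))
```

Even though every synthetic catalog shares the same b value, the largest magnitude differs noticeably from sample to sample, which is the tail variability the text refers to.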

3.  Magnitude  Distributions  in  Individual  Fault 
Zones 

[9]  We  systematically  extend  our  analysis  to  all  major 
mapped faults in Southern California. We assign earthquakes,
in 3D, to the nearest fault in the Southern California
Earthquake Center (SCEC) Community Fault Model (CFM),
version 3.0. This is similar to the rCFM earthquake database
[Woessner and Hauksson, 2006; Hauksson, 2010]
which  assigns  earthquakes  to  the  nearest  fault  as  defined  by 
the  rectilinear  CFM  version  2.5.  The  SCEC  CFM  version  3.0 
that  we  use  in  this  analysis  contains  triangulated,  nonplanar 
fault  surfaces.  Like  the  rCFM  catalog,  we  use  events  from 
the  Southern  California  Seismic  Network  from  1981  to  2004 
(inclusive),  relocated  using  a  double-difference  method 
[Hauksson and Shearer, 2005]. We adjust the relocated catalog
in two ways: (1) we replace the magnitudes with more
recent  magnitudes  from  the  Southern  California  Seismic 
Network  (SCSN),  and  (2)  we  add  missing  events  that  are  in 
the  SCSN  catalog  but  absent  from  the  older  rCFM  database. 
This gives a total of 26,479 earthquakes above magnitude 2.5
and within 20 km of the CFM fault segments. Importantly,
the revised data set includes the 1992 M7.3 Landers earthquake,
which is absent from the original relocated catalog.
This  earthquake  is  the  largest  earthquake  in  the  revised  data 
set.  The  addition  of  missing  events  and  the  use  of  newer 
SCSN  magnitudes  do  not  significantly  change  the  results 
we  present  here. 

[10]  We  separate  the  earthquakes  into  bins  on  the  basis  of 
the  fault  zone  to  which  each  is  assigned,  which  is  the  closest 
fault  in  the  CFM  (we  calculate  the  closest  distance  in  3D, 
taking into account the depths of the events and the nonplanar
fault sources of the CFM 3.0). The faults themselves
are  chosen  (“segmented”)  just  as  they  are  defined  in  the 
CFM.  There  is  certainly  some  subjectivity  in  how  segments 
are  defined;  this  cannot  be  avoided,  but  we  do  not  personally 
modify  the  faults  as  defined  by  the  community  consensus 
CFM  representation.  The  largest  earthquakes  in  the  catalog 
may indeed rupture multiple segments, or even have hypocenters
located by the catalog some distance from the primary
rupture. We consider hypocenters only and do not use
extended  sources  or  assign  large  earthquakes  to  multiple 
segments;  this  also  prevents  data  selection  on  our  part  as 
Figure 2. Epicenters for earthquakes (M > 2.5, 1981-2004)
within 20 km of the Community Fault Model (CFM)
3.0  faults  (surface  traces  shown  with  black  lines)  are  colored 
according  to  their  distance,  in  kilometers,  from  the  nearest 
CFM  fault  plane. 

faults  involved  in  particular  events  are  subject  to  debate 
(whereas  hypocenter  distance  to  the  CFM  is  well  defined). 

[11]  Fault  traces  and  epicenters  for  the  earthquakes  in  our 
data  set  are  shown  in  Figure  2.  For  the  following  analysis  we 
include  events  within  5  km  of  each  fault  plane  segment  and 
with  a  magnitude  above  2.5  to  ensure  completeness.  Of  the 
163  faults  in  the  CFM  database,  155  faults  have  associated 
earthquakes  within  5  km  and  above  the  minimum  magnitude. 

[12] On the basis of the magnitude of the largest earthquake
in each fault zone bin and the number of earthquakes
in that bin, we can obtain a p value. This gives the probability
of observing a largest earthquake at least as extreme as
in that fault zone, provided that the null hypothesis is correct.
Our null hypothesis is that earthquake magnitudes
within each fault zone follow a G-R distribution. The p value
for a set of N earthquakes with a largest observed magnitude
Mmax observed is

$$p = 1 - \left(1 - 10^{-(M_{\max\,\mathrm{observed}} - M_{\min})}\right)^{N}, \qquad (2)$$

if  the  set  is  complete  down  to  magnitude  Mmin  and  b  =  1. 
Note  that  equation  (2)  neglects  the  upper  magnitude  cutoff 
of  the  G-R  relationship.  This  is  valid  if  there  are  not  enough 
events  in  the  catalog  to  “see”  this  cutoff.  Thus  our  null 
hypothesis  is  that  the  magnitudes  are  selected  from  a  G-R 
distribution  with  a  b  value  of  1,  and  that  the  maximum 
magnitude  cutoff  is  significantly  larger  than  the  largest  event 
in  our  data  set  (which  has  a  magnitude  of  7.3).  A  b  value  of  1 
is  found  to  fit  Southern  California  seismicity  as  a  whole 
[Hutton et al., 2010].

[13] We also test a p value statistic that can incorporate
spatially variable b values [see, e.g., Woessner and Hauksson,
2006], given by $p = 1 - \left(1 - 10^{-b(M_{\max\,\mathrm{observed}} - M_{\min})}\right)^{N}$. The
p value statistic given in equation (2) may have greater
power in situations where an anomalously large event could
be fit by relaxing the b value; however, if the true b value
is much greater than 1.0, a truly anomalous event may be
missed. We apply the second p value statistic to the
binned data, assuming the maximum likelihood b value for
each bin. However, we find that this statistic does not
significantly change our results. We will thus focus on the
results for the simpler statistic in equation (2).
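As a concrete illustration of the largest-event p value, the following minimal sketch evaluates 1 - (1 - 10^(-b (Mmax observed - Mmin)))^N, which reduces to the b = 1 statistic of equation (2) when b is set to 1. The function name and argument names are hypothetical; the event counts and magnitudes in the usage lines are the ones quoted in this section for the Lavic Lake, Johnson Valley, and Santa Monica fault bins, with Mmin = 2.5.

```python
import math


def largest_event_p_value(n_events, m_max_obs, m_min=2.5, b=1.0):
    """Probability that the largest of n_events G-R magnitudes is at
    least m_max_obs, assuming no upper magnitude cutoff."""
    # Probability that a single event reaches m_max_obs.
    exceed = 10.0 ** (-b * (m_max_obs - m_min))
    if exceed >= 1.0:
        return 1.0  # every event is at least m_min
    # 1 - (1 - exceed)**n_events, written to stay accurate for tiny exceed.
    return -math.expm1(n_events * math.log1p(-exceed))


for name, n, m in [("Lavic Lake", 1039, 7.1),
                   ("Johnson Valley", 763, 7.3),
                   ("Santa Monica", 38, 6.7)]:
    print("%-15s N = %4d  Mmax = %.1f  p = %.4f"
          % (name, n, m, largest_event_p_value(n, m)))
```

Passing a bin-specific maximum likelihood b value as `b` gives the variable-b form of the statistic; the default `b=1.0` corresponds to the simpler statistic the analysis focuses on.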

[14]  We  calculate  p  values  for  the  seismicity  in  each  fault 
bin,  with  Mmin  =  2.5  and  assuming  b  =  1.  Only  15  of  the 
155  faults  (9.7%)  have  maximum  observed  earthquake 
magnitudes  beyond  the  90%  confidence  level  (one  sided); 
12  of  the  faults  (7.7%  of  the  faults)  have  maximum  events 
beyond  the  95%  confidence  level.  Furthermore,  only  two 
of  the  faults  have  events  larger  than  the  99%  confidence 
interval (p < 0.01).

[15]  The  number  of  events  for  each  segment  versus  the 
magnitude  of  the  largest  event  is  shown  in  Figure  3a,  along 
with  confidence  intervals  for  90%,  99%,  and  99.9%.  In 
Figure  3b  we  generate  similar  results  for  synthetic  faults.  We 
synthetically  model  faults  by  drawing  events  randomly  from 
a G-R distribution with a b value of 1 and no upper magnitude
cutoff. The synthetic faults are constrained to have the
same  total-number-of-events  distribution  as  the  real  faults. 
The largest-event distributions of the real faults and synthetic
faults are not significantly different (see Figure 3). We
therefore  have  a  null  result:  the  largest  events  in  CFM  fault 
zones  are  not  larger  than  would  be  expected  were  they  pulled 
randomly  from  a  G-R  distribution. 
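The synthetic-fault comparison can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code behind Figure 3b: the per-fault event counts are hypothetical random integers (the real CFM counts are not reproduced here), and the largest event of each synthetic fault is drawn directly by inverting the distribution of the maximum rather than by generating every event.

```python
import math
import random


def p_value(n, m_max, m_min=2.5, b=1.0):
    """Largest-event p value for n G-R events (no upper cutoff)."""
    exceed = 10.0 ** (-b * (m_max - m_min))
    if exceed >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-exceed))


def simulate_largest_event(n, rng, m_min=2.5, b=1.0):
    """Largest magnitude among n draws from an unbounded G-R distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    # Invert the CDF of the maximum, F(m) = (1 - 10**(-b (m - m_min)))**n.
    exceed = -math.expm1(math.log(u) / n)  # equals 1 - u**(1/n)
    return m_min - math.log10(exceed) / b


def count_anomalous(event_counts, level, rng):
    """Number of synthetic faults whose largest event has p < 1 - level."""
    hits = 0
    for n in event_counts:
        m_max = simulate_largest_event(n, rng)
        if p_value(n, m_max) < 1.0 - level:
            hits += 1
    return hits


rng = random.Random(42)
# Hypothetical per-fault event counts for 155 synthetic faults.
event_counts = [rng.randint(1, 2000) for _ in range(155)]
print(count_anomalous(event_counts, 0.90, rng), "of", len(event_counts),
      "synthetic faults beyond the 90% level")
```

Under the null hypothesis the p values are uniformly distributed, so roughly 10% of synthetic faults fall beyond the 90% level and roughly 1% beyond the 99% level, regardless of how many events each fault contains.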

[16]  Given  that  faults  form  a  complex  network,  and  that 
catalog  events  have  location  errors,  it  is  not  always  clear 
which  earthquakes  should  be  assigned  to  a  given  fault. 
However,  varying  the  distance  we  include  around  a  fault 
surface  has  little  effect  on  the  results.  Including  events 
within  1  km  of  the  fault  plane  and  20  km  of  the  fault  plane, 
for  example,  gives  three  and  two  faults,  respectively,  with 
p < 1%. This number of fault bins with p < 1% is not statistically
significant (five such fault bins would be required
for  this  to  be  statistically  significant  at  one-sided  95% 
confidence). 
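The threshold of five fault bins can be checked with a simple binomial calculation. The sketch below assumes that the fault bins are independent and that there are 155 of them (the count reported for the 5 km selection; the counts for the 1 km and 20 km selections may differ slightly), and the function name is hypothetical.

```python
import math


def min_count_for_significance(n_faults, p_single, alpha=0.05):
    """Smallest k with P(X >= k) <= alpha for X ~ Binomial(n_faults, p_single)."""
    tail = 1.0  # P(X >= 0)
    for k in range(n_faults + 1):
        if tail <= alpha:
            return k
        pmf = (math.comb(n_faults, k)
               * p_single ** k * (1.0 - p_single) ** (n_faults - k))
        tail -= pmf  # tail is now P(X >= k + 1)
    return None  # no count within n_faults reaches the requested level


print(min_count_for_significance(155, 0.01))
```

With each bin having a 1% chance of producing p < 1% under the null hypothesis, observing two or three such bins is unremarkable; the chance of seeing at least two is close to one half.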

[17]  Even  though  the  magnitude  distributions  for  some 
faults  show  events  that  appear  large  by  eye,  this  variation  is 
to  be  expected  for  power  law  distributions.  The  largest 
magnitudes  in  this  data  set  are  well  modeled  by  a  Gutenberg- 
Richter distribution. The faults with the three largest maximum
events are shown in Figure 4. The largest earthquake in
Figure 4a, which shows seismicity assigned to the Lavic
Lake fault, is the 1999 M7.1 Hector Mine earthquake. This
fault segment has 1039 earthquakes, and applying equation
(2) gives a p value of 0.026. The 1992 M7.3 Landers earthquake
is assigned to the Johnson Valley Fault (Figure 4b),
which with 763 events has a p value of 0.012. Figure 4c
shows the seismicity nearest to the Santa Monica fault. For
this fault zone there are 38 earthquakes; the maximum event
in this bin is the 1994 Northridge M6.7 earthquake, which
because of its depth and the northward dip of the Santa
Monica fault is actually closer to the Santa Monica fault, as
defined by the CFM, than to the Northridge thrust fault.
Because of the small number of earthquakes in this bin and
the large maximum event, the p value for this fault segment
is 0.0024. Had the Northridge earthquake been assigned to
the Northridge thrust fault, which contains 706 other earthquakes
(Figure 4d), many of them aftershocks of the 1994
Northridge main shock, the p value would be 0.044.
[Figure 3, panel a title: Largest Earthquake Distribution]


Figure  3.  The  largest-event  distribution  for  (a)  faults  in  Southern  California  and  (b)  synthetic  faults  with 
magnitudes  following  a  G-R  distribution  and  the  same  number-of-events  distribution  as  the  real  faults. 
Each  circle  represents  one  fault  section.  The  black  line  shows  the  G-R  extrapolation  from  small  events,  which 
is typically considered the G-R expectation. In fact, as we discuss in the text, this extrapolation, which corresponds
to the peak of the maximum event distribution (the black curve shown in Figure 5a), is below the mean
expectation  for  the  maximum  observed  event.  The  blue,  green,  and  red  lines  show  the  90%,  99%,  and  99.9% 
confidence  bounds,  respectively,  on  the  magnitude  of  the  largest  event,  given  the  number  of  events  assigned 
to  the  fault.  Only  two  of  the  155  faults  are  beyond  the  99%  confidence  bounds,  which  shows  that  on  the 
whole the faults are not more anomalous in terms of their largest events than synthetic faults with earthquake
magnitudes sampled from a G-R distribution.


[18] Besides the Santa Monica fault, there is only one
other fault bin that has a p value below 0.01. This is the
Clamshell-Sawpit Canyon fault, which contains only one
event above magnitude 2.5, the 1991 M5.8 Sierra Madre
earthquake. While the Sierra Madre main shock is closest to
the Clamshell fault as defined by the CFM, the aftershocks
of this event are closest to the Sierra Madre fault. Thus the
Clamshell fault has only one event but a large maximum
event, which gives this fault the smallest p value in the data set,
5.0 × 10^-4.

[19] Even though the largest event for each of the faults
shown in Figure 4 is much larger than the G-R expectation,
the  fault  zones,  considered  as  a  whole,  do  not  contain 
anomalously  large  events.  We  expect  approximately  this 
proportion  of  faults  to  have  maximum  observed  magnitudes 
higher  than  the  G-R  extrapolation  from  small  magnitudes 
(and  we  would  expect  the  lowest  p  values  to  occur  on  the 
faults  that  contain  the  largest  earthquakes).  This  is  because 
the  largest-event  distribution  for  a  power  law  is  skewed,  as 
shown  in  Figure  5.  While  the  cumulative  rate  of  small  events 
matches  the  G-R  curve  very  well,  at  high  magnitudes  the 
individual  samples  show  much  more  scatter.  The  last  point 


of each sample, when the magnitudes are plotted cumulatively,
is more likely than not to be above a straight-line
extrapolation  from  lower  magnitudes.  Put  another  way,  the 
distribution  of  the  last  point,  shown  in  black  in  Figure  5a,  is 
skewed  to  the  right,  and  both  the  median  and  mean  of  this 
largest-event  distribution  are  higher  than  the  mode,  which 
corresponds to where the G-R probability distribution function
(plotted in red on the same plot) intersects the x axis.
(Analytically the black probability distribution function for
the largest observed magnitude is equivalent to the derivative
of p(Mmax observed) in equation (2).)
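The skewness argument can be made explicit with a short derivation. The following is a sketch under the same assumptions as equation (2), generalized to an arbitrary b value and with no upper magnitude cutoff; the symbols $F$, $f$, $x$, and $m_{\mathrm{mode}}$ are introduced here for illustration only.

```latex
% Distribution of the largest of N independent G-R magnitudes
F(m) = \Pr(M_{\max} < m) = \left(1 - x\right)^{N},
\qquad x = 10^{-b\,(m - M_{\min})}

% Density of the largest event (magnitude of the derivative of p in equation (2))
f(m) = \frac{dF}{dm} = N\, b \ln(10)\; x \left(1 - x\right)^{N-1}

% The density peaks where x (1 - x)^{N-1} is largest
\frac{d}{dx}\left[x\,(1 - x)^{N-1}\right]
  = (1 - x)^{N-2}\,(1 - N x) = 0
\;\;\Longrightarrow\;\; x = \frac{1}{N},
\qquad m_{\mathrm{mode}} = M_{\min} + \frac{\log_{10} N}{b}

% Probability that the largest event exceeds the mode
\Pr(M_{\max} > m_{\mathrm{mode}})
  = 1 - \left(1 - \frac{1}{N}\right)^{N}
  \;\ge\; 1 - \frac{1}{e} \approx 0.63
```

The mode is the magnitude at which the extrapolated G-R curve predicts exactly one event, and the largest observed event exceeds it roughly 63% of the time. The same expression shows that a tenfold increase in N raises the mode by 1/b magnitude units.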

[20]  Thus  it  is  not  unexpected  that  a  magnitude-frequency 
curve,  when  plotted  cumulatively,  will  have  a  perceived 
characteristic  “bump”  in  the  tail  that  is  actually  a  byproduct 
of  power  law  statistics.  In  fact,  power  law  statistics  implies 
that  most  random  draws  from  a  pure  G-R  distribution  will 
have  a  maximum  observed  magnitude  that  is  higher  than  a 
log  linear  extrapolation  of  the  low-magnitude  rate  (for 
example, the red line in Figure 5a) appears to predict.

[21]  The  extreme  variability  in  the  tail  of  a  power  law  is 
also  evident  in  subsets  of  the  data,  as  pointed  out  by  Howell 
[1985]  and  shown  in  Figure  6.  Sorting  the  data  by  distance 
[Figure 4 panel titles: (a) Lavic Lake Fault; (b) Johnson Valley Fault]


Figure 4. (a-c) Cumulative magnitude-frequency distributions for the three faults with the largest earthquakes
in the data set. The G-R extrapolation with b = 1 is shown in red. While by eye these faults appear
to contain anomalously large events, on the whole the largest-event distribution among the CFM fault
sections is consistent with G-R behavior. (d) The 1994 M6.7 Northridge earthquake is located closer to
the  Santa  Monica  fault  than  to  the  Northridge  Thrust  fault,  as  defined  by  the  CFM  3.0;  this  results  in  a 
magnitude-frequency  distribution  for  the  Santa  Monica  fault  that  appears  more  anomalous  because  the 
smaller  aftershocks  of  the  Northridge  earthquake  primarily  locate  on  the  Northridge  Thrust.  It  should 
be  noted  that  picking  out  the  faults  with  the  largest  earthquakes  will  naturally  result  in  distributions  that 
appear  to  violate  G-R  behavior;  however,  this  is  a  result  of  data  selection,  and  when  we  analyze  all  the 
faults,  we  find  that  the  largest  earthquakes  do  not  violate  G-R  behavior. 


from  the  CFM  faults  results  in  similar  variability  as  the 
random  subsets  shown  in  Figure  5a.  In  fact,  this  variability  is 
necessary;  that  is,  one  of  the  10  subsets,  on  average,  must 
have  a  largest  event  a  magnitude  unit  higher  than  a  G-R 
extrapolation  from  lower  magnitudes  if  the  total  set  of 
earthquakes  is  to  follow  a  G-R  distribution  as  well  (since  a 
data  set  10  times  the  size  will  have,  on  average,  a  largest 
observed  event  1  magnitude  unit  higher  when  b  =  1).  This 
variability  in  the  tail,  interestingly,  does  not  decrease  with 
more  data  until  the  data  set  is  large  enough  to  be  affected  by 
the maximum possible magnitude for that region. This stability
of the largest-event distribution with respect to sample
size  is  shown  in  Figure  5b. 

4.  Magnitudes  on  the  Fault  Versus  in  the  Bulk 

[22]  To  what  extent  are  earthquakes  that  nucleate  on  large, 
mapped  faults  different  than  earthquakes  that  nucleate  on 
smaller  faults  in  the  “bulk”?  Certainly  many  of  the  faults  are 
readily  apparent  in  seismicity  locations;  however,  are  the 
large,  mapped  faults  apparent  from  other  features  of  the 
seismicity,  namely  the  magnitude  distribution?  Although 
the  major  faults  in  California  may  accommodate  much  of  the 
strain release, it is another question whether large earthquakes
nucleate near the major faults, given the propensity
for faults to rupture together in single ruptures [Wesnousky,
2008].

[23]  The  extent  to  which  magnitudes  are  sensitive  to 
nucleation  location  has  important  implications  for  hazard 
analysis.  If,  for  example,  larger  earthquakes  are  more  likely 


to  nucleate  closer  to  mapped  faults,  this  would  suggest 
increased  hazard  from  potential  foreshocks  located  close  to 
major  faults  relative  to  other  regions  [see,  e.g.,  Agnew  and 
Jones,  1991].  Furthermore,  if  the  magnitude  distribution  is 
sensitive  to  the  size  of  nearby  faults,  it  also  suggests  that  the 
G-R  magnitude  distribution  we  observe  could  be  an  effect  of 
the  fault  network  geometry. 

[24]  The  magnitude  distribution  of  our  catalog  does,  in 
fact,  change  with  distance  from  the  CFM  faults,  as  shown 
in  Figure  6.  The  maximum  likelihood  b  value  for  the  10% 
of  earthquakes  that  are  closest  to  the  fault  is  1.08  ±  0.04 
at  95%  confidence.  By  contrast  the  furthest  bin  from  the 
CFM faults has a b value ranging from 1.15 to 1.24, at 95%
confidence. The correlation (we use Pearson’s linear correlation
in this paper) between the b values for the 10 bins
shown  in  Figure  6a  and  the  distance  of  the  bins  from  the 
CFM  faults  is  statistically  significant.  In  addition,  we  can 
obtain  an  extremely  statistically  significant  result  which 
does  not  rely  on  binning  at  all  by  calculating  the  correlation 
between  the  magnitude  of  each  earthquake  in  the  data  set 
and  its  distance  to  the  closest  CFM  fault.  This  correlation  is 
-0.026 (it is negative because magnitudes tend to increase
as distance from the fault decreases). While this correlation
may seem small in the absolute sense (as is to be expected
given that the b values are not dramatically different) it is,
in fact, significant at p = 3 × 10^-5, as determined from randomly
shuffled versions of the catalog. By increasing the
minimum magnitude (see Figure 6b), we can remove enough
earthquakes so as to lose statistical significance; however,
this does not happen for any Mmin < 3.1.
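A shuffle test of this kind can be sketched as follows. The text does not state the number of shuffles or whether the test is one sided, so both are assumptions here: the sketch counts shuffled catalogs whose correlation is at least as negative as the observed one. The function names are hypothetical, and the demonstration catalog is synthetic, with a b value that is assumed to rise linearly from 1.08 at the fault to 1.20 at 20 km.

```python
import math
import random


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)


def shuffle_p_value(mags, dists, n_shuffles=2000, rng=None):
    """One-sided p value from randomly shuffled magnitudes.

    Returns the observed correlation and the fraction of shuffled
    catalogs (plus the observed one) with correlation <= observed.
    """
    rng = rng or random.Random()
    r_obs = pearson_r(mags, dists)
    shuffled = list(mags)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if pearson_r(shuffled, dists) <= r_obs:
            count += 1
    return r_obs, (count + 1) / (n_shuffles + 1)


rng = random.Random(7)
dists = [rng.uniform(0.0, 20.0) for _ in range(2000)]
# Synthetic magnitudes with a distance-dependent b value (an assumption).
mags = [2.5 + rng.expovariate((1.08 + 0.006 * d) * math.log(10.0))
        for d in dists]
r, p = shuffle_p_value(mags, dists, n_shuffles=200, rng=rng)
print("r = %.3f, one-sided shuffle p = %.3f" % (r, p))
```

Because the smallest p value this procedure can return is 1 / (n_shuffles + 1), resolving a significance level near 10^-5 requires on the order of 10^5 or more shuffles.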
[Figure 5 panel titles: (a) Distribution of Maximum Event for 1000 Randomly Sampled G-R Magnitudes (x axis: Magnitude (M)); (b) Distribution of Maximum Event for N Randomly Sampled G-R Magnitudes]


Figure 5. (a) One hundred samples of 1000 random G-R events are shown in blue, and the input G-R distribution
is shown in red (note that the G-R distribution function extends below the x axis). As expected,
the  largest  deviations  from  the  red  line  are  at  the  high  magnitudes.  Furthermore,  the  distribution  of  the 
maximum  observed  event  in  each  sample  (black)  is  a  skewed  distribution.  Even  though  the  maximum 
of  the  distribution  agrees  with  the  red  G-R  curve,  both  the  mean  and  median  of  the  distribution  are  higher. 
This  shows  an  important  fact  of  power  law  statistics:  it  is  more  likely  than  not  that  the  largest  event  in 
a G-R sample will be larger than an extrapolation from rates of smaller earthquakes. (b) The variation
from sample to sample in the largest event is stable for samples with N ≥ 10 events. Thus, provided that
there  are  not  enough  events  to  sample  the  maximum  possible  magnitude,  obtaining  more  data  does  not 
reduce  the  scatter  expected  in  the  tail  of  the  distribution. 


[25]  One  effect  that  could  be  causing  the  b  value  change  is 
short-term catalog incompleteness following large earthquakes.
Since we are testing for nonstationarity in the magnitude
distribution function, it is extremely important that
any time intervals included in the catalog be complete down
to the cutoff magnitude, which we choose to be Mmin = 2.5.
It  is  well  known  that  because  of  aftershocks  and  coda  waves, 
catalogs  are  incomplete  immediately  following  earthquakes. 
This  phenomenon  is  known  as  short-term  aftershock 


incompleteness  [Kagan,  2004].  We  account  for  this  known 
effect  by  removing  time  periods  of  the  catalog  following 
each  event;  the  amount  of  time  removed  depends  on  the 
magnitude  of  the  event. 

[26] Events are removed within a time interval of

$$t = \max \begin{cases} 10^{(M - M_{\min} - 4.5)/0.76} \ \text{days} \\ 30 \ \text{sec} \end{cases} \qquad (3)$$
[Figure 6 panel titles: (a) Magnitude Distributions for Earthquakes Sorted by Distance to CFM; (b) Maximum Likelihood b-value as a Function of Minimum Magnitude (x axis: Minimum Magnitude)]


Figure  6.  (a)  All  earthquakes  within  20  km  of  the  CFM  faults  (black)  are  sorted  by  distance  from  the 
nearest  CFM  fault  and  then  divided  into  10  sets  of  equal  size.  While  the  variation  in  the  tails  of  the  subset 
magnitude  distributions  is  not  unusual,  there  is  variation  in  the  magnitude  distribution  with  distance  from 
the  faults  that  can  be  seen.  The  nonuniform  b  values  of  these  subsets  as  well  as  the  correlation  between  the 
magnitude and distance to the CFM are both statistically significant. (b) The b values for the 10 subsets as
a  function  of  minimum  magnitude  are  shown.  The  95%  confidence  error  bars  are  shown  for  the  closest  bin 
to  the  faults;  these  error  bars  are  approximately  the  same  width  for  the  other  bins  since  each  bin  contains 
an  equal  number  of  earthquakes.  The  correlation  between  magnitude  and  distance  from  the  CFM  faults  is 
statistically  significant  for  a  minimum  magnitude  between  2.5  and  3.1. 


following each earthquake with magnitude M. The top
expression is taken from Helmstetter et al. [2006]. In addition,
for conservatism, we have added a minimum time
window removal of 30 sec for all events. This leaves only
2438 earthquakes in the catalog; however, magnitude is still
observed to be negatively correlated with distance to the
CFM (p = 3.8 × 10^-4). The relation of Helmstetter et al.
[2006] was developed using many aftershock sequences
and thus may be an underestimate for particular sequences,
as shown by Woessner et al. [2011]. However, even a more
strict criterion that removes a time interval of 10t for each
event, as defined by equation (3), leaves a statistically


significant  negative  correlation.  Therefore  we  conclude  that 
the  observed  b  value  change  is  not  caused  by  short-term 
aftershock  incompleteness. 
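The time-window removal can be sketched as below. Several points are assumptions rather than statements from the text: the window is taken as the larger of 10^((M - Mmin - 4.5)/0.76) days and 30 sec, the removal is applied to the whole catalog rather than only to nearby events, and removed events still open their own windows. The function names and the three-event example are hypothetical.

```python
SECONDS_PER_DAY = 86400.0


def removal_window_days(mag, m_min=2.5, min_window_sec=30.0):
    """Days of catalog to discard after an event of magnitude mag."""
    # Assumed form of the magnitude-dependent window, in days.
    magnitude_window = 10.0 ** ((mag - m_min - 4.5) / 0.76)
    return max(magnitude_window, min_window_sec / SECONDS_PER_DAY)


def remove_incomplete(events, m_min=2.5, scale=1.0):
    """Keep events that fall outside every earlier event's window.

    events: list of (time_in_days, magnitude), sorted by time.
    scale:  multiplier on the window; scale=10 mimics the stricter test.
    """
    kept = []
    blocked_until = float("-inf")
    for t, mag in events:
        if t > blocked_until:
            kept.append((t, mag))
        # Every event, kept or not, extends the blocked interval.
        window = scale * removal_window_days(mag, m_min)
        blocked_until = max(blocked_until, t + window)
    return kept


catalog = [(0.0, 7.0), (0.5, 3.0), (1.5, 3.0)]
print(remove_incomplete(catalog))
print(remove_incomplete(catalog, scale=10.0))
```

Under these assumptions an M7.0 event with Mmin = 2.5 blocks exactly one day of catalog, while events near the completeness threshold block only the 30 sec minimum.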

[27]  It  should  be  noted  that  the  CFM  faults  have  been 
established  using  seismicity  (in  addition  to  surface  traces, 
seismic reflection profiles, and wellbore data) [Plesch et al.,
2007].  If  many  of  the  faults  in  the  CFM  are  drawn  such  that 
they  are  close  to  the  largest  earthquakes  in  the  catalog,  then 
it  is  not  surprising  that  the  magnitude  distribution  would 
change  with  distance  from  the  faults.  However,  this  result  is 
just  as  significant  when  we  restrict  our  catalog  to  M  <  4. 
Therefore  we  can  conclude  that  the  largest  earthquakes  in 
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the  catalog  ( M  >  4)  are  not  driving  this  result.  The  best  test 
to  ensure  that  there  is  no  circularity  between  the  develop¬ 
ment  of  the  CFM  and  changes  in  the  magnitude  distribution 
with  respect  to  it  would  be  to  perform  these  tests  on  the 
portion  of  the  catalog  developed  post-CFM.  Unfortunately, 
this  version  of  the  CFM  was  last  updated  in  January  2004, 
which  is  roughly  coincident  with  the  end  of  our  catalog. 
Newer  relocated  catalogs  are  not  yet  available;  in  the  future 
an  updated  catalog  will  allow  one  to  test  whether  this  effect 
is  observed  in  seismicity  not  used  to  develop  the  CFM. 

5.  Discussion 

[28] We observe highly statistically significant changes in
b value with distance from the major faults in Southern
California, as defined by the CFM 3.0. If these observed
b value changes are persistent in time (and, importantly, are
not caused by the process by which the CFM is defined) and
extend up to high magnitudes, then they suggest that
earthquakes nucleating near major faults have greater potential to
become large earthquakes. Importantly, this b value change
is a different phenomenon from spatial variation in maximum
magnitude; it suggests that even if the maximum possible
magnitude is sufficiently large, an earthquake nucleating
10-20 km from a major fault in Southern California is still
less likely to be large in magnitude than an
earthquake nucleating within, for example, 1 km of a
major mapped fault. This result also suggests that the lengths
of local faults influence the magnitude distribution (to the
extent that the faults represented in the CFM are the longest
and most well defined), which is surprising given that faults
are known to “link up” as a fault network, as evidenced by
earthquakes that rupture multiple faults.

[29] The more complex, nonplanar fault surfaces
defined in the CFM 3.0 are necessary to clearly see the
observed b value variation. The changes in b value observed
between different bins when earthquakes are sorted by
distance from the CFM, as shown in Figure 6, are not
statistically significant for the same earthquakes sorted by
distance to the CFM version 2.5, although the correlation
between magnitude and distance from the fault for the
unbinned data is borderline statistically significant (this test
has greater power).
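The binned comparison can be illustrated with a short sketch that sorts events by distance to the nearest fault, splits them into equal-count bins, and estimates a b value in each bin. The estimator here is the standard Aki maximum likelihood formula with Utsu's half-bin correction; whether this matches the estimator used for Figure 6 is an assumption, and the function names and defaults are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def b_value_mle(mags: Sequence[float], m_min: float, bin_width: float = 0.0) -> float:
    """Aki maximum likelihood b value: log10(e) / (mean(M) - m_min).

    bin_width applies Utsu's correction (m_min - bin_width / 2) for catalogs
    whose magnitudes are rounded to a fixed increment.
    """
    used = [m for m in mags if m >= m_min]
    mean_excess = sum(used) / len(used) - (m_min - bin_width / 2.0)
    return math.log10(math.e) / mean_excess


def b_values_by_distance(
    mags: Sequence[float],
    dists: Sequence[float],
    n_bins: int = 10,
    m_min: float = 2.5,
    bin_width: float = 0.0,
) -> List[Tuple[float, float]]:
    """Return (largest distance in bin, b value) for equal-count distance bins."""
    pairs = sorted((d, m) for m, d in zip(mags, dists) if m >= m_min)
    n = len(pairs)
    if n < n_bins:
        raise ValueError("need at least one event per bin")
    out: List[Tuple[float, float]] = []
    for k in range(n_bins):
        chunk = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        bin_mags = [m for _, m in chunk]
        out.append((chunk[-1][0], b_value_mle(bin_mags, m_min, bin_width)))
    return out
```

Because each bin holds only a fraction of the events, the per-bin b values carry wider uncertainties than a single statistic computed on all events, which is consistent with the remark above that the unbinned correlation test has greater power.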

[30] Interestingly, while we do see evidence of fault
geometry influencing the magnitude distribution, we do not
see evidence of non-G-R behavior for the largest
earthquakes. Could faults have characteristic behavior beyond
the magnitudes available in the instrumental catalog? We
limited our analysis to the modern catalog, which does limit
our ability to constrain the magnitude distribution at
magnitudes greater than 7. However, if the regional G-R
relationship is a result of the power law distribution of fault lengths,
as suggested by Wesnousky [1999], then smaller
faults should have characteristic behavior at smaller magnitudes.
Our data set contains segments from long faults such as the
San Andreas fault, as well as far smaller faults; still, we see
no evidence of anomalously large events for any of the fault
zones.

[31] It seems that fault geometry does influence the
magnitude distribution, but through the b value (which changes the
rates at all magnitudes) rather than at characteristic
magnitudes. Magnitude distributions may appear “characteristic”
by eye, but this is, in fact, due to the large intrinsic
variability of samples from a power law distribution. This work
shows that the available data are consistent with the null
hypothesis of G-R scaling near major faults.
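The intrinsic variability of G-R samples can be illustrated with a small simulation. The sketch below draws synthetic catalogs from a pure G-R distribution (magnitudes above `m_min` are exponentially distributed with rate b ln 10) and reports how widely the largest event varies between catalogs. The catalog size, number of catalogs, and seed are arbitrary illustrative choices, not values from this study.

```python
import math
import random
from typing import List, Optional, Tuple


def sample_gr_magnitudes(
    n: int, b: float = 1.0, m_min: float = 2.5, rng: Optional[random.Random] = None
) -> List[float]:
    """Draw n magnitudes from an unbounded G-R distribution above m_min."""
    rng = rng or random.Random()
    beta = b * math.log(10.0)  # exponential rate equivalent to the b value
    return [m_min + rng.expovariate(beta) for _ in range(n)]


def max_magnitude_spread(
    n_events: int, n_catalogs: int, b: float = 1.0, m_min: float = 2.5, seed: int = 0
) -> Tuple[float, float]:
    """Approximate central 95% range of the largest magnitude across catalogs."""
    rng = random.Random(seed)
    maxima = sorted(
        max(sample_gr_magnitudes(n_events, b, m_min, rng)) for _ in range(n_catalogs)
    )
    lo = maxima[int(0.025 * (n_catalogs - 1))]
    hi = maxima[int(0.975 * (n_catalogs - 1))]
    return lo, hi


if __name__ == "__main__":
    lo, hi = max_magnitude_spread(n_events=1000, n_catalogs=200)
    print(f"largest event ranges from about M{lo:.1f} to M{hi:.1f}")
```

Every catalog here is G-R by construction, so any catalog whose largest event sits well above the trend of the smaller events is an example of apparent, not real, characteristic behavior.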

6.  Conclusion 

[32] Many seismic hazard products rely on the assumption
that earthquakes that nucleate on a major fault are different
(e.g., likely to be larger in magnitude) than those that
nucleate “in the bulk” (i.e., on smaller, unmapped faults).
We do, in fact, see changes in the magnitude distribution
with distance from the major faults; however, they are not
of the “characteristic” variety typically included in such
models. We see evidence for changes in b value but do not
see evidence for non-G-R behavior for the largest events.
Still, these changes in b value, although small, can have a
large effect on rates at high magnitudes and are therefore
important for seismic hazard analysis.
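The leverage of a small b value change follows directly from equation (1). If two populations have the same rate at a reference magnitude, their rates at a larger magnitude M differ by a factor of 10 raised to the b value difference times the magnitude range. The short sketch below evaluates that factor; the b values and magnitudes in the usage example are hypothetical, chosen only to show the size of the effect.

```python
def rate_ratio(b_near: float, b_far: float, m: float, m_ref: float = 2.5) -> float:
    """Ratio of near-fault to far-from-fault rates at magnitude >= m.

    Assumes both populations follow log10(N) = a - b*M and have equal
    rates at the reference magnitude m_ref.
    """
    return 10.0 ** ((b_far - b_near) * (m - m_ref))


if __name__ == "__main__":
    # Hypothetical b values differing by 0.1, extrapolated from M2.5 to M7.
    print(rate_ratio(b_near=0.95, b_far=1.05, m=7.0))
```

With these illustrative inputs the exponent is 0.1 × 4.5 = 0.45, so the near-fault rate of M ≥ 7 events would be a little under three times the far-from-fault rate, provided the b value difference persists to high magnitudes.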